{ "metadata": { "name": "IGV Oly project" }, "name": "IGV Oly project", "nbformat": 2, "worksheets": [ { "cells": [ { "cell_type": "markdown", "source": "#Create BAM files from Paired End sequencing of Oly Samples" }, { "cell_type": "markdown", "source": "###To create a BAM file, search for the TopHat 2 PE (Paired End) app\n![iplant][oys1]\n[oys1]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/iPlantAppTopHat2.jpg" }, { "cell_type": "markdown", "source": "###Input the left and right reads; read all files together, not separately\n![iplant][oys2]\n[oys2]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/iPlantAppTopHat2Setup1.jpg" }, { "cell_type": "markdown", "source": "###Finally, add the reference genome from the OlyO_Pat_v02.fa file to create the PE BAM\n![iplant][oys3]\n[oys3]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/iPlantAppTopHatSE3.jpg" }, { "cell_type": "markdown", "source": "#Create BAM files from Single End sequencing of Oly Larvae" }, { "cell_type": "markdown", "source": "###To create a BAM for the Single End larvae samples, we do basically the same thing, except we use TopHat 2 SE.\n![iplant][lar1]\n[lar1]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/iPlantAppTopHatSE1.jpg" }, { "cell_type": "markdown", "source": "![iplant][lar2]\n[lar2]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/iPlantAppTopHatSE2.jpg" }, { "cell_type": "markdown", "source": "![iplant][lar3]\n[lar3]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/iPlantAppTopHatSE3.jpg" }, { "cell_type": "markdown", "source": "#IGV Browser" }, { "cell_type": "markdown", "source": "###I created a new genome file using the OlyO_Pat_v02 FASTA for IGV \n###To create publicly accessible sessions, this genome must be hosted on a public server\n![IGVimages][IGVgene]\n[IGVgene]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGVGenomebuild10Kcontigs.jpg" }, { "cell_type": "markdown", "source": "###To make a public session, BAM 
files must be uploaded via URL to IGV from a publicly hosted server.\n![IGVimages][IGVweb1]\n[IGVweb1]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGVWebSetup1.jpg" }, { "cell_type": "markdown", "source": "###I imported the BAM files for the Olys and then overlaid them on top of each other. \n###There is still very little coverage." }, { "cell_type": "markdown", "source": "![IGVimages][IGVcompare1]\n[IGVcompare1]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGVv02MFcompare1.jpg" }, { "cell_type": "markdown", "source": "#Create GFF annotation track for Oly" }, { "cell_type": "markdown", "source": "###The next hurdle is annotating the genome browser to show exons. Using the method below,\n###I was able to BLAST the v2 transcriptome against the v2 Oly genome to create tabular output. " }, { "cell_type": "code", "collapsed": false, "input": "cd BLAST/bin/olyblast", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake/BLAST/bin/olyblast" } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": "ls", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "OlyCon2OlyTrans.tab olytransv2.nhr Olyv02.nin\nOlycontigstoOlyTrans.tab olytransv2.nin Olyv02.nsq\nOlyO10kcontigs_exon_a.gff olytransv2.nsq TJGS_Olurida_transcriptome_v2_2.fa\nOlyO_Pat_v02.fa Olyv02.gff\nOlyTrans2OlyScaf.tab Olyv02.nhr" } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": "cd ../", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake/BLAST/bin" } ], "prompt_number": 6 }, { "cell_type": "markdown", "source": "###Prior to running blastn, I made a database using the command: \n\n###makeblastdb -in OlyO_Pat_v02.fa -dbtype nucl -out Olyv02" }, { "cell_type": "code", "collapsed": true, "input": "!blastn -query olyblast/TJGS_Olurida_transcriptome_v2_2.fa -db olyblast/Olyv02 -out 
olyblast/otbnos.tab -evalue 1E-20 -outfmt 6", "language": "python", "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": "ls", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "\u001b[0m\u001b[01;32mblastdb_aliastool\u001b[0m* \u001b[01;32mblastx\u001b[0m* \u001b[01;32mmakembindex\u001b[0m* \u001b[01;32msegmasker\u001b[0m*\n\u001b[01;32mblastdbcheck\u001b[0m* \u001b[01;32mconvert2blastmask\u001b[0m* \u001b[01;32mmakeprofiledb\u001b[0m* \u001b[01;32mtblastn\u001b[0m*\n\u001b[01;32mblastdbcmd\u001b[0m* \u001b[01;32mdeltablast\u001b[0m* \u001b[01;34molyblast\u001b[0m/ \u001b[01;32mtblastx\u001b[0m*\n\u001b[01;32mblast_formatter\u001b[0m* \u001b[01;32mdustmasker\u001b[0m* \u001b[01;32mpsiblast\u001b[0m* \u001b[01;32mupdate_blastdb.pl\u001b[0m*\n\u001b[01;32mblastn\u001b[0m* \u001b[01;32mlegacy_blast.pl\u001b[0m* \u001b[01;32mrpsblast\u001b[0m* \u001b[01;32mwindowmasker\u001b[0m*\n\u001b[01;32mblastp\u001b[0m* \u001b[01;32mmakeblastdb\u001b[0m* \u001b[01;32mrpstblastn\u001b[0m*" } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": "cd olyblast", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake/BLAST/bin/olyblast" }, { "output_type": "stream", "stream": "stdout", "text": "" } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": "ls", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "OlyCon2OlyTrans.tab olytransv2.nhr Olyv02.nin\nOlycontigstoOlyTrans.tab olytransv2.nin Olyv02.nsq\nOlyO10kcontigs_exon_a.gff olytransv2.nsq otbnos.tab\nOlyO_Pat_v02.fa Olyv02.gff TJGS_Olurida_transcriptome_v2_2.fa\nOlyTrans2OlyScaf.tab Olyv02.nhr" } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": "!head -5 otbnos.tab", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": 
"Olurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_549\t87.16\t7716\t122\t762\t6\t7148\t9554\t2135\t0.0\t7956\nOlurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_550\t86.10\t6915\t116\t736\t873\t7148\t7401\t693\t0.0\t6665\nOlurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_550\t87.76\t1699\t18\t167\t874\t2434\t7475\t9121\t0.0\t1810\nOlurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_550\t85.99\t828\t18\t90\t2747\t3508\t11421\t10626\t0.0\t 797\nOlurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_550\t80.39\t1188\t15\t153\t2441\t3428\t9248\t10417\t0.0\t 702" } ], "prompt_number": 14 }, { "cell_type": "markdown", "source": "#Blast2Gff.pl " }, { "cell_type": "markdown", "source": "At first I tried using a documented Blast2Gff pipeline but this eventually failed" }, { "cell_type": "code", "collapsed": false, "input": "cd ", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake" } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": "cd BioPerl-1.6.1", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake/BioPerl-1.6.1" } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": "ls", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "AUTHORS \u001b[0m\u001b[01;34m_build\u001b[0m/ DEPRECATED INSTALL.SKIP MANIFEST \u001b[01;34mscripts\u001b[0m/\n\u001b[01;34mBio\u001b[0m/ \u001b[01;32mBuild\u001b[0m* \u001b[01;34mdoc\u001b[0m/ INSTALL.WIN MANIFEST.SKIP \u001b[01;34mt\u001b[0m/\nBioPerl.pm \u001b[01;32mBuild.PL\u001b[0m* \u001b[01;34mexamples\u001b[0m/ LICENSE META.yml\n\u001b[01;34mblib\u001b[0m/ Changes \u001b[01;34mide\u001b[0m/ \u001b[01;34mmaintenance\u001b[0m/ \u001b[01;34mmodels\u001b[0m/\nBUGS DEPENDENCIES INSTALL Makefile.PL README" } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": "cd scripts", 
"language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake/BioPerl-1.6.1/scripts" } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": "ls", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "\u001b[0m\u001b[01;34mbiblio\u001b[0m/ blast2gff.pl \u001b[01;34mindex\u001b[0m/ \u001b[01;34mseq\u001b[0m/ \u001b[01;34mutilities\u001b[0m/\n\u001b[01;34mBio-DB-EUtilities\u001b[0m/ \u001b[01;34mdas\u001b[0m/ \u001b[01;34mpopgen\u001b[0m/ \u001b[01;34mseqstats\u001b[0m/\n\u001b[01;34mBio-DB-GFF\u001b[0m/ \u001b[01;34mDB\u001b[0m/ \u001b[01;32mREADME\u001b[0m* \u001b[01;34mtaxa\u001b[0m/\n\u001b[01;34mBio-SeqFeature-Store\u001b[0m/ \u001b[01;34mDB-HIV\u001b[0m/ \u001b[01;34msearchio\u001b[0m/ \u001b[01;34mtree\u001b[0m/" } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": "!perl blast2gff.pl --blast_result_file ~/BLAST/bin/olyblast/otbnos.tab --reference_sequence_file ~/BLAST/bin/olyblast/OlyO_Pat_v02.fa --gff_output_file ~/BLAST/bin/olyblast/Oly2.gff", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "Replacement list is longer than search list at /usr/local/share/perl/5.14.2/Bio/Range.pm line 251." 
} ], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": "cd ", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake" } ], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": "cd BLAST/bin/olyblast", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake/BLAST/bin/olyblast" } ], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": "ls", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "Oly2.gff olytransv2.nsq\nOlyCon2OlyTrans.tab Olyv02.gff\nOlycontigstoOlyTrans.tab Olyv02.nhr\nOlyO10kcontigs_exon_a.gff Olyv02.nin\nOlyO_Pat_v02.fa Olyv02.nsq\nOlyTrans2OlyScaf.tab otbnos.tab\nolytransv2.nhr TJGS_Olurida_transcriptome_v2_2.fa\nolytransv2.nin" } ], "prompt_number": 26 }, { "cell_type": "code", "collapsed": false, "input": "!head -5 Oly2.gff", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "OlyO_Pat_PacBio_1_contig_1\tsequence\tsequence\t1\t15049\t.\t+\t.\tSequence OlyO_Pat_PacBio_1_contig_1\nOlyO_Pat_PacBio_1_contig_2\tsequence\tsequence\t1\t16946\t.\t+\t.\tSequence OlyO_Pat_PacBio_1_contig_2\nOlyO_Pat_PacBio_1_contig_3\tsequence\tsequence\t1\t15665\t.\t+\t.\tSequence OlyO_Pat_PacBio_1_contig_3\nOlyO_Pat_PacBio_1_contig_4\tsequence\tsequence\t1\t6353\t.\t+\t.\tSequence OlyO_Pat_PacBio_1_contig_4\nOlyO_Pat_PacBio_1_contig_5\tsequence\tsequence\t1\t16523\t.\t+\t.\tSequence OlyO_Pat_PacBio_1_contig_5" } ], "prompt_number": 27 }, { "cell_type": "markdown", "source": "###After importing this file into IGV, I can see that the pipeline failed to annotate the genome:\n###it did not produce any segment-specific output\n" }, { "cell_type": "markdown", "source": "![IGVimages][IGVfail]\n[IGVfail]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGVv02MFcompare10.jpg" }, { "cell_type": "markdown", "source": "#2_Blast2Gff.pl" }, { 
"cell_type": "markdown", "source": "###Steven showed me his modified Blast2Gff file which actually works on my output. I used this to create the annotation file for the IGV" }, { "cell_type": "code", "collapsed": false, "input": "cd", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake" } ], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": "cd BioPerl-1.6.1/scripts", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake/BioPerl-1.6.1/scripts" } ], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": "ls", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "\u001b[0m\u001b[01;34mbiblio\u001b[0m/ blast2gff.pl \u001b[01;34mindex\u001b[0m/ \u001b[01;34mseq\u001b[0m/ \u001b[01;34mutilities\u001b[0m/\n\u001b[01;34mBio-DB-EUtilities\u001b[0m/ \u001b[01;34mdas\u001b[0m/ \u001b[01;34mpopgen\u001b[0m/ \u001b[01;34mseqstats\u001b[0m/\n\u001b[01;34mBio-DB-GFF\u001b[0m/ \u001b[01;34mDB\u001b[0m/ \u001b[01;32mREADME\u001b[0m* \u001b[01;34mtaxa\u001b[0m/\n\u001b[01;34mBio-SeqFeature-Store\u001b[0m/ \u001b[01;34mDB-HIV\u001b[0m/ \u001b[01;34msearchio\u001b[0m/ \u001b[01;34mtree\u001b[0m/" } ], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": "!wget https://raw.github.com/sr320/fish546/master/2_Blast2Gff.pl", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "--2014-03-13 14:52:02-- https://raw.github.com/sr320/fish546/master/2_Blast2Gff.pl\nResolving raw.github.com (raw.github.com)... 199.27.77.133\nConnecting to raw.github.com (raw.github.com)|199.27.77.133|:443... connected." }, { "output_type": "stream", "stream": "stdout", "text": "HTTP request sent, awaiting response... 
" }, { "output_type": "stream", "stream": "stdout", "text": "200 OK\nLength: 7427 (7.3K) [text/plain]\nSaving to: `2_Blast2Gff.pl'\n\n\n 0% [ ] 0 --.-K/s \n100%[======================================>] 7,427 --.-K/s in 0.001s \n\n2014-03-13 14:52:02 (11.7 MB/s) - `2_Blast2Gff.pl' saved [7427/7427]" } ], "prompt_number": 31 }, { "cell_type": "code", "collapsed": false, "input": "cd", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake" }, { "output_type": "stream", "stream": "stdout", "text": "" } ], "prompt_number": 32 }, { "cell_type": "code", "collapsed": false, "input": "cd BLAST/bin/olyblast", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "/home/jake/BLAST/bin/olyblast" } ], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": "ls", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "Oly2.gff olytransv2.nsq\nOlyCon2OlyTrans.tab Olyv02.gff\nOlycontigstoOlyTrans.tab Olyv02.nhr\nOlyO10kcontigs_exon_a.gff Olyv02.nin\nOlyO_Pat_v02.fa Olyv02.nsq\nOlyTrans2OlyScaf.tab otbnos.tab\nolytransv2.nhr TJGS_Olurida_transcriptome_v2_2.fa\nolytransv2.nin" } ], "prompt_number": 34 }, { "cell_type": "code", "collapsed": true, "input": "!perl ~/BioPerl-1.6.1/scripts/2_Blast2Gff.pl -i otbnos.tab -o Olytranv2_OlyPacBio_v02.gff -d \"OlyOvtran_v2\" -p EXON -s \"something\"", "language": "python", "outputs": [], "prompt_number": 36 }, { "cell_type": "code", "collapsed": false, "input": "ls", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "Oly2.gff olytransv2.nsq\nOlyCon2OlyTrans.tab Olytranv2_OlyPacBio_v02.gff\nOlycontigstoOlyTrans.tab Olyv02.gff\nOlyO10kcontigs_exon_a.gff Olyv02.nhr\nOlyO_Pat_v02.fa Olyv02.nin\nOlyTrans2OlyScaf.tab Olyv02.nsq\nolytransv2.nhr otbnos.tab\nolytransv2.nin TJGS_Olurida_transcriptome_v2_2.fa" } ], "prompt_number": 37 }, { "cell_type": "code", "collapsed": false, 
"input": "!stat Olytranv2_OlyPacBio_v02.gff", "language": "python", "outputs": [ { "output_type": "stream", "stream": "stdout", "text": " File: `Olytranv2_OlyPacBio_v02.gff'\n Size: 1142443 \tBlocks: 2232 IO Block: 4096 regular file\nDevice: 805h/2053d\tInode: 3670725 Links: 1" }, { "output_type": "stream", "stream": "stdout", "text": "Access: (0664/-rw-rw-r--) Uid: ( 1000/ jake) Gid: ( 1000/ jake)\nAccess: 2014-03-13 14:57:26.716204805 -0700\nModify: 2014-03-13 14:57:25.656204802 -0700\nChange: 2014-03-13 14:57:25.656204802 -0700\n Birth: -" } ], "prompt_number": 38 }, { "cell_type": "markdown", "source": "### Now that I have an annotation file, I save that to my web server and upload it via url to IGV\n![igvimages][IGVgff]\n[IGVgff]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGVWebSetup2.jpg" }, { "cell_type": "markdown", "source": "###This is the IGV session with a few BAM tracks and gff annotation.\n![id][IGVsuccess]\n[IGVsuccess]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGVWebSetup3.jpg" }, { "cell_type": "markdown", "source": "###Next we have to save the IGV session to a public hosted webserver to make it accessible\n![idg][IGVsave]\n[IGVsave]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGVWebSetup6.jpg" }, { "cell_type": "markdown", "source": "###Now open the XML file and replace the Session Genome = \"EagleOlyv02\"\n![idg][IGVedit1]\n[IGVedit1]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGVWebSetup4.jpg" }, { "cell_type": "markdown", "source": "###With the URL to the publicly hosted genome file so it reads \nSession Genome = \"http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGV%20stuff/EagleOlyv02.genome\"\n###This allows the session to load up the appropriate URL instead of a locally hosted genome file. 
\n![idg][IGVedit2]\n[IGVedit2]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGVWebSetup5.jpg" }, { "cell_type": "markdown", "source": "#IGV Final Product\n![IGV][IGVFinal]\n[IGVFinal]: http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGVWebSetup7.jpg" }, { "cell_type": "markdown", "source": "#Link to Session .XML\n[Investigate the Oly Genome from your own computer](http://eagle.fish.washington.edu/dermochelys/Bioinformatics/IGV%20stuff/OlyWebSession.xml \"Jake's Semi-Annotated Old Fashioned Oly Genome\")" } ] } ] }